On the Whittle index of Markov modulated restless bandits
Authors
Abstract
In this paper, we study a Multi-Armed Restless Bandit Problem (MARBP) subject to time fluctuations. This model has numerous applications in practice, such as cloud computing systems or wireless communication networks. Each bandit is formed by two processes: a controllable process and an environment. The transition rates of the controllable process are determined by the state of the environment, which is an exogenous Markov process. The decision maker has full information on every bandit, and the objective is to determine the optimal policy that minimises the long-run average cost. Given the complexity of the problem, we set out to characterise the Whittle index, which is obtained by solving a relaxed version of the MARBP. As reported in the literature, this heuristic performs extremely well for a wide variety of problems. Assuming the optimal policy of the relaxed problem is of threshold type, we provide an algorithm that finds Whittle's index. We then consider a multi-class queue with linear holding cost and impatient customers. For this model, we show optimality of threshold policies, prove indexability, and obtain the index in closed form. We also study the limiting regimes in which the environment evolves relatively slower and faster than the controllable process. By numerical simulations, we assess the suboptimality of the index policy in various scenarios, the general observation being that, as in the case of the standard MARBP, the suboptimality gap is small.
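As a rough illustration of the relaxation idea behind the Whittle index (and not the paper's algorithm, which exploits the threshold structure and the Markov-modulated environment), the sketch below applies the generic "bisect on the passivity subsidy" recipe to a toy single arm: for each candidate subsidy it solves the single-arm average-cost problem by relative value iteration and records the smallest subsidy at which the passive action becomes optimal in a given state, which is the Whittle index of that state when the arm is indexable. All transition matrices and costs are illustrative placeholders.

```python
# Minimal numerical sketch: Whittle index of a toy single arm via
# bisection on the passivity subsidy, assuming indexability.
import numpy as np

def passive_optimal(P0, P1, cost, subsidy, iters=5000, tol=1e-9):
    """Relative value iteration for the single-arm average-cost MDP.

    Action 0 (passive) earns `subsidy`; action 1 (active) does not.
    Returns a boolean array: True where the passive action is optimal.
    """
    n = len(cost)
    h = np.zeros(n)
    for _ in range(iters):
        q_passive = cost - subsidy + P0 @ h   # passive: cost reduced by subsidy
        q_active = cost + P1 @ h              # active: no subsidy
        h_new = np.minimum(q_passive, q_active)
        h_new -= h_new[0]                     # relative values (fix reference state 0)
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    q_passive = cost - subsidy + P0 @ h
    q_active = cost + P1 @ h
    return q_passive <= q_active

def whittle_index(P0, P1, cost, state, lo=-50.0, hi=50.0, steps=60):
    """Smallest subsidy that makes the passive action optimal in `state`."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if passive_optimal(P0, P1, cost, mid)[state]:
            hi = mid   # passive already optimal: the index is at most mid
        else:
            lo = mid   # passive not yet optimal: the index exceeds mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Toy 3-state arm (placeholder numbers): activity pushes the state
    # down, passivity lets it drift up; holding cost grows with the state.
    P0 = np.array([[0.7, 0.3, 0.0],
                   [0.0, 0.7, 0.3],
                   [0.0, 0.0, 1.0]])
    P1 = np.array([[1.0, 0.0, 0.0],
                   [0.6, 0.4, 0.0],
                   [0.0, 0.6, 0.4]])
    cost = np.array([0.0, 1.0, 3.0])
    for s in range(3):
        print(f"state {s}: Whittle index ≈ {whittle_index(P0, P1, cost, s):.3f}")
```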
Related resources
On the Whittle Index for Restless Multi-armed Hidden Markov Bandits
We consider a restless multi-armed bandit in which each arm can be in one of two states. When an arm is sampled, the state of the arm is not available to the sampler. Instead, a binary signal with a known randomness that depends on the state of the arm is available. No signal is available if the arm is not sampled. An arm-dependent reward is accrued from each sampling. In each time step, each a...
On an Index Policy for Restless Bandits
We investigate the optimal allocation of effort to a collection of n projects. The projects are 'restless' in that the state of a project evolves in time, whether or not it is allocated effort. The evolution of the state of each project follows a Markov rule, but transitions and rewards depend on whether or not the project receives effort. The objective is to maximize the expected time-average ...
Regret Bounds for Restless Markov Bandits
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner’s actions. We suggest an algorithm that after T steps achieves Õ(√T) regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we sho...
Index Policies for a Class of Discounted Restless Bandits
The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong p...
Multi-armed restless bandits, index policies, and dynamic priority allocation
This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...
Journal
Journal title: Queueing Systems
Year: 2022
ISSN: 1572-9443, 0257-0130
DOI: https://doi.org/10.1007/s11134-022-09737-y